Computer-Assisted Categorization of Patent Documents in the International Patent Classification
Authors
Abstract
The World Intellectual Property Organization is currently developing a system for assisting users in categorizing patent documents in the International Patent Classification (IPC). The system should support the classification of documents in several languages and aims to help users locate relevant IPC symbols through a convenient web-based service. The approach relies on powerful machine learning algorithms that are trained on manually classified documents to recognize IPC topics. We detail in-house results of applying a custom-built, state-of-the-art computer-assisted categorizer to English-, French-, Russian-, and German-language patent documents. We find that reliable computer-assisted categorization at IPC subclass level is an achievable goal for the statistical methods employed here. A categorization system suggesting three IPC symbols for each document can predict the main IPC class correctly for around 90% of documents, and the main IPC subclass for about 85% of documents. The accuracy of the system at main group level is enhanced if the user first validates the correct IPC class.
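The abstract reports accuracy at three levels of the IPC hierarchy (class, subclass, main group) for a system that suggests three IPC symbols per document. The sketch below shows one way such a "three suggestions" score could be computed, assuming that a document counts as correct when its main IPC symbol, truncated to the level being scored, matches the truncation of any of the top three suggestions. The symbol format accepted by the regular expression, the function names `truncate`, `top_k_accuracy` and `restrict_to_class`, and the toy documents are hypothetical illustrations, not WIPO's implementation or data.

```python
import re

# Hypothetical pattern for IPC symbols written like "A01B 1/02":
# section letter (A-H), two-digit class, subclass letter, then main group/subgroup.
_IPC_RE = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def truncate(symbol: str, level: str) -> str:
    """Reduce an IPC symbol to its 'class', 'subclass' or 'main_group' prefix."""
    m = _IPC_RE.match(symbol.strip())
    if m is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    section, cls, sub, group, _subgroup = m.groups()
    if level == "class":
        return section + cls                      # e.g. "A01"
    if level == "subclass":
        return section + cls + sub                # e.g. "A01B"
    if level == "main_group":
        return f"{section}{cls}{sub} {group}/00"  # main groups end in /00
    raise ValueError(f"unknown level: {level!r}")


def top_k_accuracy(ranked_predictions, main_symbols, level, k=3):
    """Share of documents whose main IPC symbol, cut to `level`, matches
    one of the first k suggestions cut to the same level."""
    if not main_symbols or len(ranked_predictions) != len(main_symbols):
        raise ValueError("predictions and main symbols must be non-empty and equal in length")
    hits = 0
    for ranked, main in zip(ranked_predictions, main_symbols):
        suggested = {truncate(s, level) for s in ranked[:k]}
        hits += int(truncate(main, level) in suggested)
    return hits / len(main_symbols)


def restrict_to_class(ranked, validated_class):
    """Drop suggestions outside the IPC class a user has confirmed (simplified)."""
    return [s for s in ranked if truncate(s, "class") == validated_class]


if __name__ == "__main__":
    # Invented toy data: three documents, three ranked suggestions each.
    predictions = [
        ["A01B 1/02", "A01C 5/00", "B62D 49/06"],
        ["H04L 29/06", "G06F 21/00", "H04W 12/00"],
        ["G06F 17/30", "G06F 16/00", "G06N 20/00"],
    ]
    main = ["A01B 1/04", "H04W 12/08", "G06Q 10/00"]
    for level in ("class", "subclass", "main_group"):
        print(level, round(top_k_accuracy(predictions, main, level), 3))
    # prints: class 1.0 / subclass 0.667 / main_group 0.667
```

`restrict_to_class` is a deliberately simple stand-in for the class-validation step described in the abstract: once a user confirms the class, suggestions outside it can be discarded before main-group candidates are considered. The abstract does not say how the real system uses the validated class, so this only illustrates why narrowing the candidate set can help.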
Similar resources
Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed
Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Patent documents are another important information source, though they are considerably less accessible. One option to expand patent search beyond pure keywords...
Patent document categorization based on semantic structural information
The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial to search for relevant existing patents in a certain field, patent categorization is a very important and useful field. As patent documents are structu...
Development of a patent document classification and search platform using a back-propagation network
In order to process large numbers of explicit knowledge documents such as patents in an organized manner, automatic document categorization and search are required. In this paper, we develop a document classification and search methodology based on neural network technology that helps companies manage patent documents more effectively. The classification process begins by extracting key phrases...
Text Categorization for Intellectual Property Comparing Balanced Winnow with SVM on Different Document Representations
This study investigates the effect of training different categorization algorithms on various patent document representations. The automation of knowledge and content management in the intellectual property domain has been experiencing a growing interest in the last decade, since the first patent classification system was presented in 1999 by Larkey [Larkey, 1999]. Typical applications of paten...
Enhancing Patent Expertise through Automatic Matching with Scientific Papers
This paper focuses on a subtask of the QUAERO research program, a major innovating research project related to the automatic processing of multimedia and multilingual content. The objective discussed in this article is to propose a new method for the classification of scientific papers, developed in the context of an international patents classification plan related to the same field. The pract...